Video super-resolution is one of the most popular tasks on mobile devices, widely used to automatically improve low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually computationally demanding, yielding low FPS rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and ask the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models were evaluated on the MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, reaching up to 500 FPS and a power consumption of 0.2 [Watt / 30 FPS]. A detailed description of all models developed in the challenge is provided in this paper.
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem, they are usually not compatible with low-power mobile NPUs, which impose many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, reaching up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
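Deep prompt tuning (DPT) has achieved great success in most natural language processing (NLP) tasks. However, it has not been well investigated in dense retrieval, where fine-tuning (FT) still dominates. When multiple retrieval tasks are deployed on the same backbone model (e.g., RoBERTa), FT-based methods are unfriendly in terms of deployment cost: each new retrieval model requires deploying the backbone model again, with no reuse. To reduce the deployment cost in this scenario, this work investigates applying DPT to dense retrieval. The challenge is that applying DPT directly to dense retrieval largely underperforms FT methods. To compensate for the performance drop, we propose two model-agnostic and task-agnostic strategies for DPT-based retrievers, namely retrieval-oriented intermediate pretraining and unified negative mining, as a general approach that is compatible with any pre-trained language model and retrieval task. Experimental results show that the proposed method (called DPTDR) outperforms previous state-of-the-art models on both MS-MARCO and Natural Questions. We also conduct ablation studies to examine the effectiveness of each strategy in DPTDR. We believe this work benefits the industry, as it saves enormous deployment effort and cost and increases the utility of computing resources. Our code is available at https://github.com/tangzhy/dptdr.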
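Spatiotemporal forecasting is an imperative topic in data science because of its diverse and critical applications in smart cities. Existing works mostly perform consecutive predictions of the following steps with observations that are obtained completely and continuously, so that the nearest observations can be exploited as key knowledge for instantaneous status estimation. However, the practical issues of early activity planning and sensor failures give rise to a brand-new task, non-consecutive forecasting. In this paper, we define spatiotemporal learning systems with missing observations as Grey Spatiotemporal Systems (G2S) and propose a Factor-Decoupled learning framework for G2S (FDG2S), whose core idea is to hierarchically decouple multi-level factors and enable both flexible aggregation and disentangled uncertainty estimation. First, to compensate for missing observations, we devise a generic semantic-neighboring sequence sampling that selects representative sequences to capture both periodic regularity and instantaneous variations. Second, we turn the prediction of non-consecutive statuses into inferring statuses under expected combined exogenous factors. In particular, we propose a factor-decoupled aggregation scheme that decouples factor-induced predictive intensity and region-wise proximity through two energy functions of a conditional random field. To infer region-wise proximity under flexible factor combinations and enable dynamic neighborhood aggregation, we further disentangle the compounded influences of exogenous factors on region-wise proximity and learn to aggregate them. Given the inherent incompleteness and critical applications of G2S, we put forward a disentangled uncertainty quantification that identifies two types of uncertainty for reliability guarantees and model interpretation.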
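This paper focuses on training implicit models of infinite layers. Specifically, previous works employ implicit differentiation and solve for the exact gradient in backward propagation. However, is it necessary to compute such an exact but expensive gradient for training? In this work, we propose a novel gradient estimate for implicit models, named the phantom gradient, that 1) forgoes the costly computation of the exact gradient; and 2) provides an update direction that is empirically preferable for implicit model training. We theoretically analyze the condition under which an ascent direction of the loss landscape can be found, and provide two specific instantiations of the phantom gradient based on damped unrolling and the Neumann series. Experiments on large-scale tasks demonstrate that these lightweight phantom gradients accelerate the backward passes in training implicit models by roughly 1.7 times, and even improve performance over approaches based on the exact gradient on ImageNet.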
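As a rough illustration of the Neumann-series idea mentioned in the abstract above: the exact gradient of an implicit model involves a term of the form $(I - J)^{-1} v$, where $J$ is the Jacobian of the layer at its fixed point. When the spectral radius of $J$ is below 1, this term can be approximated by the truncated sum $\sum_{k=0}^{K-1} J^k v$. The sketch below shows only that generic truncation in plain Python. It is not the paper's estimator (which, per the abstract, also has a damped-unrolling variant), and the names `mat_vec`, `neumann_solve`, and `num_terms` are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector (list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def neumann_solve(jacobian, vector, num_terms):
    """Approximate (I - J)^{-1} v by the truncated Neumann series
    sum_{k=0}^{num_terms-1} J^k v.

    Assumption: the spectral radius of J is below 1, so the series converges.
    """
    if num_terms < 1:
        raise ValueError("num_terms must be at least 1")
    term = list(vector)   # current term J^k v, starting with k = 0
    total = list(vector)  # running sum of the terms
    for _ in range(num_terms - 1):
        term = mat_vec(jacobian, term)
        total = [t + s for t, s in zip(total, term)]
    return total


if __name__ == "__main__":
    # Toy contraction: the exact solution of (I - J) x = v is [2, 4/3].
    J = [[0.5, 0.0], [0.0, 0.25]]
    v = [1.0, 1.0]
    print(neumann_solve(J, v, num_terms=3))
    print(neumann_solve(J, v, num_terms=50))
```

Using few terms trades accuracy for a cheaper backward pass, which is the kind of trade-off the abstract describes.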
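Self-supervised learning (SSL) has achieved great success in speech recognition, while only limited exploration has been attempted for other speech processing tasks. Because a speech signal contains multi-faceted information, including speaker identity, paralinguistics, and spoken content, learning universal representations for all speech tasks is challenging. To tackle this problem, we propose a new pre-trained model, WavLM, to solve full-stack downstream speech tasks. WavLM jointly learns masked speech prediction and denoising during pre-training. In this way, WavLM not only retains its ability to model speech content through masked speech prediction, but also improves its potential for non-ASR tasks through speech denoising. In addition, WavLM employs a gated relative position bias in the Transformer structure to better capture the sequence ordering of the input speech. We also scale up the training dataset from 60k hours to 94k hours. WavLM Large achieves state-of-the-art performance on the SUPERB benchmark and brings significant improvements to various speech processing tasks on their representative benchmarks. The code and pre-trained models are available at https://aka.ms/wavlm.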
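Modern graph neural networks (GNNs) learn node embeddings through multilayer local aggregation and have achieved great success in a variety of graph applications. However, tasks on disassortative graphs usually require non-local aggregation. Moreover, we find that local aggregation is even harmful for some disassortative graphs. In this work, we propose a simple yet effective non-local aggregation framework for GNNs with an efficient attention-guided sorting. Based on it, we develop various non-local GNNs. We perform thorough experiments to analyze disassortative graph datasets and evaluate our non-local GNNs. Experimental results demonstrate that our non-local GNNs significantly outperform prior methods on seven benchmark datasets in terms of both model performance and efficiency.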
Photonic neural networks are a brain-inspired information processing technology that uses photons instead of electrons to perform artificial intelligence (AI) tasks. However, existing architectures are designed for a single task and fail to multiplex different tasks in parallel within a single monolithic system, because competition between tasks degrades model performance. This paper proposes a novel optical multi-task learning system by designing multi-wavelength diffractive deep neural networks (D2NNs) with a joint optimization method. By encoding multi-task inputs into multi-wavelength channels, the system can increase the computing throughput and significantly alleviate the competition, performing multiple tasks in parallel with high accuracy. We design two-task and four-task D2NNs with two and four spectral channels, respectively, for classifying different inputs from the MNIST, FMNIST, KMNIST, and EMNIST databases. The numerical evaluations demonstrate that, under the same network size, multi-wavelength D2NNs achieve significantly higher classification accuracies for multi-task learning than single-wavelength D2NNs. Furthermore, as the network size increases, multi-wavelength D2NNs that perform multiple tasks simultaneously achieve classification accuracies comparable to those of multiple single-wavelength D2NNs trained individually to perform the tasks separately. Our work paves the way for developing wavelength-division multiplexing technology to achieve high-throughput neuromorphic photonic computing and more general AI systems that perform multiple tasks in parallel.
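The powerful modeling capabilities of all-attention-based transformer architectures often cause overfitting and, for natural language processing tasks, lead to an implicitly learned internal language model in the autoregressive transformer decoder, which complicates the integration of external language models. In this paper, we explore relaxed attention, a simple and easy-to-implement smoothing of the attention weights, which improves the general transformer architecture in two ways. First, relaxed attention provides regularization when applied to the self-attention layers in the encoder. Second, we show that it naturally supports the integration of an external language model, as it suppresses the implicitly learned internal language model by relaxing the cross-attention in the decoder. We demonstrate the benefit of relaxed attention across several tasks, with clear improvements in combination with recent benchmark approaches. Specifically, we exceed the former state-of-the-art performance of 26.90% word error rate on the largest public lip-reading benchmark, LRS3, with a word error rate of 26.31%, and we achieve a top-performing BLEU score of 37.67 on the IWSLT14 (DE $\rightarrow$ EN) machine translation task without external language models and with virtually no additional model parameters. Code and models will be made publicly available.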
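The abstract above describes relaxed attention only as a smoothing of the attention weights. One simple way to smooth a row of attention weights, assumed here purely for illustration, is to blend it with a uniform distribution over the same positions: $\tilde{a}_i = (1 - \gamma)\, a_i + \gamma / T$, where $T$ is the number of attended positions. The function name `relax_attention` and the coefficient name `gamma` are hypothetical, and the sketch may differ from the authors' implementation.

```python
def relax_attention(weights, gamma):
    """Blend each row of attention weights with a uniform distribution.

    weights: list of rows, each a list of non-negative floats summing to 1
             (e.g. the softmax output of one attention head).
    gamma:   smoothing coefficient in [0, 1]; 0 leaves the weights unchanged,
             1 replaces every row with the uniform distribution.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    relaxed = []
    for row in weights:
        uniform = 1.0 / len(row)  # uniform mass per attended position
        relaxed.append([(1.0 - gamma) * w + gamma * uniform for w in row])
    return relaxed


if __name__ == "__main__":
    attention = [[0.7, 0.2, 0.1], [0.05, 0.9, 0.05]]
    for row in relax_attention(attention, gamma=0.1):
        print(row, sum(row))
```

Because both the original row and the uniform distribution sum to 1, every smoothed row still sums to 1, so it can replace the softmax output before it multiplies the value vectors.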
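Steerable models can provide very general and flexible equivariance by formulating equivariance requirements in the language of representation theory and feature fields, which has proven effective for many vision tasks. However, deriving steerable models for 3D rotations is much more difficult than in the 2D case, because the mathematics of 3D rotations is more complicated. In this work, we employ partial differential operators (PDOs) to model 3D filters and derive general steerable 3D CNNs, called PDO-s3DCNNs. We prove that the equivariant filters are subject to linear constraints, which can be solved efficiently under various conditions. To the best of our knowledge, PDO-s3DCNNs are the most general steerable CNNs for 3D rotations, in the sense that they cover all common subgroups of $SO(3)$ and their representations, whereas existing methods can only be applied to specific groups and representations. Extensive experiments show that our models preserve equivariance well in the discrete domain and outperform previous works on the SHREC'17 retrieval and ISBI 2012 segmentation tasks with low network complexity.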